expression_matrix <- readRDS(url("https://depts.washington.edu:/trapnell-lab/software/monocle3/celegans/data/packer_embryo_expression.rds"))
cell_metadata <- readRDS(url("https://depts.washington.edu:/trapnell-lab/software/monocle3/celegans/data/packer_embryo_colData.rds"))
gene_annotation <- readRDS(url("https://depts.washington.edu:/trapnell-lab/software/monocle3/celegans/data/packer_embryo_rowData.rds"))
cds <- new_cell_data_set(expression_matrix,
cell_metadata = cell_metadata,
gene_metadata = gene_annotation)
This works the same as in clustering analysis, this time using a different strategy for batch correction.
cds <- preprocess_cds(cds, num_dim = 50)
cds <- align_cds(cds, alignment_group = "batch", residual_model_formula_str = "~ bg.300.loading + bg.400.loading + bg.500.1.loading + bg.500.2.loading + bg.r17.loading + bg.b01.loading + bg.b02.loading")
In addition to using the alignment_group arg to
align_cds(), which aligns groups of cells (i.e. batches),
also using residual_model_formual_str, for subtracting
continuous effects. Each of the cols bg.300.loading,
bg.400.loading, corresponds to a background signal that a
cell might be contaminated with. Passing these cols as terms in the
residual_model_formuala_str tells align_cds()
to subtract these signals prior to dimensionality reduction, clustering,
and trajectory inference.
Use UMAP for dimensionality reduction (default method).
cds <- reduce_dimension(cds)
plot_cells(cds, label_groups_by_cluster = FALSE, color_cells_by = "cell.type")
Use plot_cells() to visualize how individual genes vary
along the trajectory. Look at some genes with interesting patterns of
expression in ciliated neurons.
ciliated_genes <- c("che-1",
"hlh-17",
"nhr-6",
"dmd-6",
"ceh-36",
"ham-1")
plot_cells(cds,
genes = ciliated_genes,
label_cell_groups = FALSE,
show_trajectory_graph = FALSE)
Monocle is able to learn when cells should be placed in the same
trajecotry as opposed to seperate trajectories through its clustering
procedure. With cluster_cells(), each cell is assigned not
only to a cluster but also to a partition. When learning trajectories,
each partition will eventually become a separate trajectory.
cds <- cluster_cells(cds)
plot_cells(cds, color_cells_by = "partition")
Fit a principal graph within each partition using
learn_graph().
cds <- learn_graph(cds)
##
|
| | 0%
|
|======================================================================| 100%
plot_cells(cds,
color_cells_by = "cell.type",
label_groups_by_cluster = FALSE,
label_leaves = FALSE,
label_branch_points = FALSE)
This graph will be used in many downstream steps, such as branch analysis and differential expression.
Once the graph has been learned, order the cells according to their progress through the developmental program, measuring this progress in pseudotime.
In order to place the cells in order, tell Monocle where the “beginning” of the biological process is. Choose regions of the graph marked as “roots” of the trajectory. In time series experiments, this can usually be accomplished by finding spots in the UMAP space that are occupied by cells from early time points.
plot_cells(cds,
color_cells_by = "embryo.time.bin",
label_cell_groups = FALSE,
label_leaves = TRUE,
label_branch_points = TRUE,
graph_label_size = 1.5)
The black lines show the structure of the graph. The graph is not
fully connected: cells in different partitions are in distinct
components of the graph.Circles with numbers in them denote special
points within the graph. Each leaf, denoted by light gray circles,
corresponds to different outcomes (cell fates) of the trajectory. Black
circles indicate branch nodes, in which cells can travel to one of
several outcomes. Control whether or not these are shown with
label_leaves and label_branch_points args to
plot_cells
order_cells will calculate where each cell falls in
pseudotime. In order to do so order_cells() needs to have
root nodes of the trajectory graph specified.
# this will not work in interactive mode, either root_pr_nodes or root_cells must be provide
# helper function below provides work around example
# cds <- order_cells(cds)
In this example, just one location was chosen. Plotting the cells and coloring them by pseudotime shows how they were ordered.
# see helper function below for working example
# plot_cells(cds,
# color_cells_by = "pseudotime",
# label_cell_groups = FALSE,
# label_leaves = FALSE,
# label_branch_points = FALSE,
# graph_label_size = 1.5)
Some of the cells are gray, meaning they have infinite pseudotime, because they were not reachable from the root nodes that were picked. Any cell on a partition that lacks a root node will be assigned an infinite pseudotime. In general, choose at least one root per partition.
The helper function below selects the root of a trajectory by first grouping the cells according to which trajectory graph node they are nearest to, then calculating what fraction of the cells at each node come from the earliest time point. It then picks the node that is most heavily occupied by early cells and returns that as the root.
# a helper function to identify the root principal points
get_earliest_principal_node <- function(cds, time_bin="130-170"){
cell_ids <- which(colData(cds)[, "embryo.time.bin"] == time_bin)
closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
root_pr_nodes <-igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names
(which.max(table(closest_vertex[cell_ids,]))))]
root_pr_nodes
}
cds <- order_cells(cds, root_pr_nodes = get_earliest_principal_node(cds))
Passing the programatically selected root node to
order_cells() via the root_pr_node arg.
plot_cells(cds,
color_cells_by = "pseudotime",
label_cell_groups = FALSE,
label_leaves = FALSE,
label_branch_points = FALSE,
graph_label_size = 1.5,
label_principal_points = TRUE)
This could be done on a per-partition basis by first grouping the
cells by partition using partitions(). This would result in
all cells being assigned a finite pseudotime.
It is often to subset cells based in their branch in the trajectory.
The function choose_graph_segments allows you to do so
interactively.
# interactive mode not working
cds_sub <- choose_graph_segments(cds,
starting_pr_node = "Y_118",
ending_pr_nodes = "Y_139")
cds_3d <- reduce_dimension(cds, max_components = 3)
cds_3d <- cluster_cells(cds_3d)
cds_3d <- learn_graph(cds_3d)
##
|
| | 0%
|
|======================================================================| 100%
cds_3d <- order_cells(cds_3d, root_pr_nodes = get_earliest_principal_node(cds))
cds_3d_plot_obj <- plot_cells_3d(cds_3d, color_cells_by = "partition")
cds_3d_plot_obj